A skip-list approach for efficiently processing forecasting queries
Abstract
Time series data is common in many settings, including scientific and financial applications. In these applications, the amount of data is often very large. We seek to support prediction queries over time series data. Prediction relies on model building, which can be too expensive to be practical if it is based on a large number of data points. We propose to use statistical hypothesis tests to choose a proper subset of data points to use for a given prediction query interval. This involves two steps: choosing a proper history length and choosing the number of data points to use within this history. Further, we use an I/O-conscious skip list data structure to provide samples of the original data set. Based on the statistics collected for a query workload, which we model as a probability mass function (PMF) over query intervals, we devise a randomized algorithm that selects a set of pre-built models (PMs) to construct, subject to a maintenance cost constraint when there are updates. Given this set of PMs, we discuss interesting query processing strategies not only for point queries, but also for range, aggregation, and JOIN queries. We conduct a comprehensive empirical study on real-world datasets to verify the effectiveness of our approaches and algorithms.
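The idea of using skip list levels as progressively sparser samples of a time series can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (build_levels, sample_history, fit_line), the fixed point budget max_points, and the plain least-squares line are all assumptions made for illustration. The paper chooses the history length and sample size with hypothesis tests and uses an I/O-conscious layout, neither of which is modeled here.

```python
import random

def build_levels(n_points, p=0.5, seed=0):
    """Assign each index a random skip-list height (hypothetical helper).

    Level k holds every index whose height exceeds k, so level 0 is the
    full series and each higher level is a sparser random sample of it.
    """
    rng = random.Random(seed)
    heights = []
    for _ in range(n_points):
        h = 1
        while rng.random() < p:
            h += 1
        heights.append(h)
    return [[i for i, h in enumerate(heights) if h > k]
            for k in range(max(heights))]

def sample_history(levels, start, max_points):
    """Pick indices >= start from the densest level that fits the budget.

    Assumption: a fixed budget stands in for the statistical tests the
    abstract describes for choosing history length and sample size.
    """
    for level in levels:
        picked = [i for i in level if i >= start]
        if len(picked) <= max_points:
            return picked
    return [i for i in levels[-1] if i >= start]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; needs two distinct xs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two sample points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

# Usage: forecast t = 1100 from a sample of the most recent 500 points.
series = [2 + 0.5 * t for t in range(1000)]   # synthetic, exactly linear
levels = build_levels(len(series))
picked = sample_history(levels, start=500, max_points=50)
a, b = fit_line(picked, [series[i] for i in picked])
forecast = a + b * 1100
```

Because the model is fitted on at most max_points samples rather than the full history, the cost of model building is bounded by the budget and not by the size of the data set.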
Similar resources
I/O Efficient Search of Large Social Networks
We introduce an I/O efficient algorithm and data structure to support fast decentralized search in large graphs modeling social networks. We structure network data in a homophily-based social hierarchy using an append-only, block-aligned skip list with an embedded tree microindex, which reduces I/O and cache line faults. We further minimize I/O when building the skip list by combining an extend...
Range queries over skip tree graphs
The support for complex queries, such as range, prefix and aggregation queries, over structured peer-to-peer systems is currently an active and significant topic of research. This paper demonstrates how Skip Tree Graph, as a novel structure, presents an efficient solution to that problem area through provision of a distributed search tree functionality on decentralised and dynamic environments....
Range-capable Distributed Hash Tables
In this paper, we present a novel indexing data structure called RDHT (Range capable Distributed Hash Table) derived from skip lists and specifically designed for storing and retrieving geographic data from a structured P2P network overlay. We have developed RDHTs as backend for the DART search engine, whose goal is to efficiently answer complex queries based on semantics and geographical conte...
A Service-oriented Scalable Dictionary in MPI
In this paper we present a distributed, in-memory, message passing implementation of a dynamic ordered dictionary structure. The structure is based on a distributed fine-grain implementation of a skip list that can scale across a cluster of multicore machines. We present a service-oriented approach to the design of distributed data structures in MPI where the skip list elements are active proce...
Simplified Self-Adapting Skip Lists
The Simplified Self-Adapting Skip List, a practical new extension of the Skip List data structure, is designed for use with data that exhibit bias, that is, a nonuniform distribution of queries to set elements. The structure observes an initially unknown degree of bias in queries to a data set and adapts itself to a consistently nearly-optimal configuration, improving search efficiency and spee...
Journal:
- PVLDB
Volume 1
Publication date: 2008